Point and interval estimates
NBIS, SciLifeLab
April 8, 2025
The sample estimate will be our best guess, a point estimate, of the population parameter.
The sample proportion and sample mean are unbiased estimates of the population proportion and population mean.
The expected value of an unbiased point estimate is the population parameter that it estimates.
The sample estimate is our best guess, but it will not be without error.
To show the uncertainty, an interval estimate for a population parameter can be computed from the sample data.
An interval estimate is an interval of possible values that with high probability contains the true population parameter.
The width of the interval estimate can be determined from the sampling distribution.
If the sampling distribution is unknown, a bootstrap interval can be computed instead.
The bootstrap uses the data we have (our sample) and repeatedly samples from it with replacement.
Put the entire sample in an urn and resample!
Sample with replacement many times.
Plot the distribution of bootstrapped means.
For a 95% bootstrap interval, compute the 2.5 and 97.5 percentiles.
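The resampling steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation: the function name `bootstrap_mean_interval`, the number of resamples and the seed are assumptions, and the six values are the BMI observations from the worked example in this material.

```python
import random

def bootstrap_mean_interval(sample, n_boot=10_000, conf=0.95, seed=1):
    """Percentile bootstrap interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    n = len(sample)
    boot_means = []
    for _ in range(n_boot):
        # Draw n observations from the sample, with replacement
        resample = rng.choices(sample, k=n)
        boot_means.append(sum(resample) / n)
    boot_means.sort()
    alpha = 1 - conf
    # Empirical alpha/2 and 1 - alpha/2 percentiles of the bootstrapped means
    lower = boot_means[int(n_boot * alpha / 2)]
    upper = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

bmi = [27, 25, 31, 29, 30, 22]
lower, upper = bootstrap_mean_interval(bmi)
print(f"95% bootstrap interval for the mean: ({lower:.1f}, {upper:.1f})")
```

The exact end points vary slightly with the seed and the number of resamples.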
A confidence interval is a type of interval estimate associated with a confidence level.
An interval that covers the population parameter \(\theta\) with probability \(1 - \alpha\) is called a confidence interval for \(\theta\) with confidence level \(1 - \alpha\).
\[\bar X \sim N(\mu, \frac{\sigma}{\sqrt{n}})\]
Based on a random sample compute the sample mean \(m\).
Use what is known about the sampling distribution to compute a confidence interval around \(m\).
If \(\sigma\) is known
\[Z = \frac{\bar X - \mu}{SEM} = \frac{\bar X - \mu}{\frac{\sigma}{\sqrt{n}}} \sim N(0, 1)\]
\[P\left(-z_{\alpha/2} < Z <z_{\alpha/2}\right) = 1-\alpha\]
\(z_{\alpha/2}\) is the value such that \(P(Z \geq z_{\alpha/2}) = \frac{\alpha}{2} \iff P(Z \leq z_{\alpha/2}) = 1 - \frac{\alpha}{2}\).
For 95% confidence, \(\alpha = 0.05\) and \(z_{\alpha/2} = 1.96\). For 90% and 99% confidence, \(z_{0.05} = 1.64\) and \(z_{0.005}=2.58\), respectively.
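As a quick check of these values, the quantile \(z_{\alpha/2}\) can be computed from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard library; the helper name `z_quantile` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_quantile(conf):
    """Return z_{alpha/2} for a two-sided interval with confidence level conf."""
    alpha = 1 - conf
    # inv_cdf(1 - alpha/2) is the value z such that P(Z <= z) = 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for conf in (0.90, 0.95, 0.99):
    print(f"{conf:.0%} confidence: z = {z_quantile(conf):.2f}")
```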
From the standard normal distribution we know that
\[P(-z_{\alpha/2}<Z<z_{\alpha/2}) = 1-\alpha\]
\[P(-z_{\alpha/2}<\frac{\bar X-\mu}{SEM}<z_{\alpha/2}) = 1-\alpha\]
\[P(\mu-z_{\alpha/2}SEM<\bar X<\mu+z_{\alpha/2}SEM) = 1-\alpha\]
\[P(\bar X-z_{\alpha/2}SEM<\mu<\bar X+z_{\alpha/2}SEM) = 1-\alpha\]
Replace the random variable \(\bar X\) with the observed sample mean, \(\bar x_{obs}\), to get the observed interval
\[\bar x_{obs}-z_{\alpha/2}SEM<\mu<\bar x_{obs}+z_{\alpha/2}SEM\]
The confidence interval with confidence level \(1-\alpha\) is
\[[\bar x_{obs} - z_{\alpha/2}SEM, \bar x_{obs} + z_{\alpha/2}SEM]\]
or
\[\mu = \bar x_{obs} \pm z_{\alpha/2}SEM\] where \(SEM = \frac{\sigma}{\sqrt{n}}\).
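The interval \(\bar x_{obs} \pm z_{\alpha/2}SEM\) can be computed as in the following sketch, which assumes that \(\sigma\) is known. The function name `z_interval` and the input numbers are hypothetical and are not taken from the material.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar_obs, sigma, n, conf=0.95):
    """Confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    sem = sigma / sqrt(n)                         # standard error of the mean
    return xbar_obs - z * sem, xbar_obs + z * sem

# Hypothetical numbers, chosen only to illustrate the call
low, high = z_interval(xbar_obs=50.0, sigma=10.0, n=25)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

With these inputs \(SEM = 10/\sqrt{25} = 2\), so the half-width of the interval is about \(1.96 \cdot 2 = 3.92\).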
The mean of a sample of \(n\) independent and identically distributed normal observations \(X_i\) is itself normally distributed:
\[\bar X \sim N(\mu, \frac{\sigma}{\sqrt{n}})\]
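What if \(\sigma\) is unknown and \(n\) is small?
Use the statistic \(T=\frac{\bar X - \mu}{SEM} = \frac{\bar X - \mu}{\frac{s}{\sqrt{n}}} \sim t(n-1)\), which is t-distributed with \(n-1\) degrees of freedom. Here \(s\) is the sample standard deviation, so \(SEM\) is estimated by \(\frac{s}{\sqrt{n}}\).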
It follows that, with \(t\) denoting the \(1-\alpha/2\) quantile of the \(t(n-1)\) distribution,
\[ \begin{aligned} P\left(-t < \frac{\bar X - \mu}{\frac{s}{\sqrt{n}}} < t\right) = 1 - \alpha \iff \\ P\left(\bar X - t \frac{s}{\sqrt{n}} < \mu < \bar X + t \frac{s}{\sqrt{n}}\right) = 1 - \alpha \end{aligned} \]
The confidence interval with confidence level \(1-\alpha\) is thus
\[[\bar x_{obs} - t \frac{s}{\sqrt{n}}, \bar x_{obs} + t \frac{s}{\sqrt{n}}]\]
or
\[\mu = \bar x_{obs} \pm t \frac{s}{\sqrt{n}}\]
For a 95% confidence interval and \(n=5\), there are \(n-1=4\) degrees of freedom and \(t=2.7764\).
The \(t\) values for different values of \(\alpha\) and degrees of freedom are tabulated and can be computed in R using the function qt.
You study the BMI of male diabetic patients. In a sample of size 6 you observe \(27, 25, 31, 29, 30, 22\). Assume that BMI is normally distributed and calculate a 95% confidence interval for the mean BMI in male diabetic patients.
The sample mean is \(\bar x = 27.3\) and the sample standard deviation is \(s = 3.39\). There are \(n-1 = 5\) degrees of freedom, and the \(t\) value for a 95% confidence interval is 2.5706.
The confidence interval is \(\bar x \pm t \frac{s}{\sqrt{n}} = 27.3 \pm 2.57 \frac{3.39}{\sqrt{6}} \approx 27.3 \pm 3.6\).
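The arithmetic of this example can be checked with a short Python sketch. Python's standard library has no t-distribution, so the \(t\) value 2.5706 is entered by hand from the text above; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

bmi = [27, 25, 31, 29, 30, 22]
n = len(bmi)
xbar = mean(bmi)   # sample mean
s = stdev(bmi)     # sample standard deviation (n - 1 in the denominator)
t = 2.5706         # tabulated t value for 95% confidence and 5 degrees of freedom, from the text

half_width = t * s / sqrt(n)
print(f"mean = {xbar:.1f}, s = {s:.2f}, half-width = {half_width:.2f}")
print(f"95% CI: ({xbar - half_width:.2f}, {xbar + half_width:.2f})")
```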
Remember that we can use the central limit theorem to show that, approximately for large \(n\),
\[P \sim N\left(\pi, SE\right) \iff P \sim N\left(\pi, \sqrt{\frac{\pi(1-\pi)}{n}}\right)\]
It follows that
\[Z = \frac{P - \pi}{SE} \sim N(0,1)\]
Based on what we know of the standard normal distribution, we can compute an interval around the population proportion \(\pi\) such that the probability that a sample proportion \(p\) falls within this interval is \(1-\alpha\).
\[ \begin{aligned} P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) &= 1-\alpha \\ P\left(-z_{\alpha/2} < \frac{P - \pi}{SE} < z_{\alpha/2}\right) &= 1 - \alpha \end{aligned} \]
We can rewrite this as
\[P\left(\pi-z_{\alpha/2} SE < P < \pi + z_{\alpha/2} SE\right) = 1-\alpha\]
In words, a sample fraction \(p\) will fall between \(\pi - z_{\alpha/2} SE\) and \(\pi + z_{\alpha/2} SE\) with probability \(1- \alpha\).
The equation can also be rewritten as
\[P\left(P-z_{\alpha/2} SE < \pi < P + z_{\alpha/2} SE\right) = 1 - \alpha\]
The observed confidence interval is what we get when we replace the random variable \(P\) with our observed fraction \(p\), and the unknown \(\pi\) in \(SE\) with \(p\):
\[p-z_{\alpha/2} SE < \pi < p + z_{\alpha/2} SE\]
\[\pi = p \pm z_{\alpha/2} SE = p \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}\]
The 95% confidence interval is
\[\pi = p \pm 1.96 \sqrt{\frac{p(1-p)}{n}}\]
A 95% confidence interval has a 95% chance of covering the true value: if the sampling is repeated many times, about 95% of the computed intervals will contain it.
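The coverage property can be illustrated with a small simulation. The sketch below repeatedly draws samples from a normal distribution with known \(\sigma\), computes the interval \(\bar x \pm z_{\alpha/2}SEM\) each time, and counts how often it covers \(\mu\). All parameter values (\(\mu = 0\), \(\sigma = 1\), \(n = 20\), 5000 repetitions, the seed) and the function name `coverage` are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(mu=0.0, sigma=1.0, n=20, conf=0.95, n_sim=5_000, seed=1):
    """Fraction of simulated z-intervals that cover the true mean mu."""
    rng = random.Random(seed)                     # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    sem = sigma / sqrt(n)                         # sigma is treated as known
    hits = 0
    for _ in range(n_sim):
        # Draw a new sample of size n and compute its mean
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - z * sem < mu < xbar + z * sem:
            hits += 1
    return hits / n_sim

cov = coverage()
print(f"Observed coverage: {cov:.3f}")
```

The observed fraction should be close to 0.95, with some simulation noise.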
Back to our example of the proportion with pollen allergy in Uppsala: \(p=0.42\) and \(SE=\sqrt{\frac{p(1-p)}{n}} = 0.0494\).
Hence, the 95% confidence interval is
\[\pi = 0.42 \pm 1.96 \cdot 0.0494 = 0.42 \pm 0.097\]
or
\[(0.42-0.097, 0.42+0.097) = (0.32, 0.52)\]
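The same calculation can be sketched in Python. The sample size is not stated in this part of the material, so \(n = 100\) below is an assumption, chosen because it reproduces the quoted \(SE\) of 0.0494. The function name `proportion_interval` is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p, n, conf=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = sqrt(p * (1 - p) / n)                    # estimated SE, with p in place of pi
    return p - z * se, p + z * se, se

# n = 100 is an assumption; it is not given in the text
low, high, se = proportion_interval(p=0.42, n=100)
print(f"SE = {se:.4f}, 95% CI: ({low:.2f}, {high:.2f})")
```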